 Arecibo


100 mystery sounds under review for signs of extraterrestrial life

Popular Science

Over 11 years, citizen scientists collected billions of data signals for the SETI@home project. After reviewing almost 30 years of signals, University of California, Berkeley researchers have identified 100 mysterious, deep-space radio blips they want to review for signs of extraterrestrial life. And they couldn't have done it without 11 years of volunteer work from millions of PC owners around the world. Even with today's advanced computers, the world's most complex data problems can't be solved by a single machine.


Classification of HI Galaxy Profiles Using Unsupervised Learning and Convolutional Neural Networks: A Comparative Analysis and Methodological Cases of Studies

arXiv.org Artificial Intelligence

Hydrogen, the most abundant element in the universe, is crucial for understanding galaxy formation and evolution. The 21 cm spectral line of neutral atomic hydrogen (HI) maps the gas kinematics within galaxies, providing key insights into interactions, galactic structure, and star formation processes. With new radio instruments, the volume and complexity of data are increasing. To analyze and classify integrated HI spectral profiles efficiently, this work presents a framework that integrates machine learning techniques, combining unsupervised methods and CNNs. To this end, we apply our framework to a selected subsample of 318 spectral HI profiles of the CIG and 30,780 profiles from the Arecibo Legacy Fast ALFA Survey catalogue. Data pre-processing involved the Busyfit package and iterative fitting with polynomial, Gaussian, and double-Lorentzian models. Clustering methods, including K-means, spectral clustering, DBSCAN, and agglomerative clustering, were used for feature extraction; to bootstrap classification, we applied K-NN, SVM, and Random Forest classifiers, optimizing accuracy with a CNN. Additionally, we introduced a 2D model of the profiles to enhance classification by adding dimensionality to the data. Three 2D models were generated based on transformations and normalised versions to quantify the level of asymmetry. These methods were tested in a previous analytical classification study conducted by the Analysis of the Interstellar Medium in Isolated Galaxies group. This approach enhances classification accuracy and aims to establish a methodology that could be applied to data analysis in future surveys conducted with the Square Kilometre Array (SKA), currently under construction. All materials, code, and models have been made publicly available in an open-access repository, adhering to FAIR principles.


Scientists analyse the famous 'WOW!' signal first detected in 1977 - and finally reveal the truth about the mysterious flash

Daily Mail - Science & tech

In 1977, the Ohio State University's Big Ear radio telescope captured a signal from space so strange that scientists are still baffled by it almost 50 years later. For decades, scientists have struggled to find any natural process capable of producing the 72-second burst which prompted astronomer Jerry Ehman to write 'WOW!' on the telescope's readout. Now, new analysis of the so-called WOW! signal has revealed that it might have been caused by a hugely powerful laser slamming into Earth. Experts say this was not the first salvo of an alien invasion, but rather the entirely natural product of a rare alignment between a collapsed star and a cloud of cool hydrogen. Unfortunately for alien-hunters, scientists from the University of Puerto Rico at Arecibo say this new evidence shows that the WOW! signal is not evidence of life beyond Earth.


Why haven't aliens contacted us? Scientists reveal their theories for the lack of any signs from extraterrestrials - despite '100% chance' that they exist

Daily Mail - Science & tech

Despite what UFO enthusiasts might claim, virtually every scientist agrees that humanity is yet to receive a message, let alone a visitor, from beyond our planet. But in the vast scale of the universe – containing an estimated 2 trillion galaxies – scientists say there is a '100 per cent chance' that there is life somewhere apart from Earth. This raises an intriguing question: If alien life truly is common in the Universe, why haven't we heard from them? From the 'Dark Forest Hypothesis' to the inevitability of nuclear war, the answer to this question may offer a chilling glimpse into the future of our own civilisation. Professor Frederick Walter, a galactic astronomer from Stony Brook University, says: 'Life is a biochemical process, it's going to happen, but as you go further down the chain things become more uncertain.'


AstroPT: Scaling Large Observation Models for Astronomy

arXiv.org Artificial Intelligence

This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models. We also find that the models' performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source 'Large Observation Model' -- a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.


Decoding Geometric Properties in Non-Random Data from First Information-Theoretic Principles

arXiv.org Artificial Intelligence

Based on the principles of information theory, measure theory, and theoretical computer science, we introduce a univariate signal deconvolution method with a wide range of applications to coding theory, particularly in zero-knowledge one-way communication channels, such as in deciphering messages from unknown generating sources about which no prior knowledge is available and to which no return message can be sent. Our multidimensional space reconstruction method from an arbitrary received signal is proven to be agnostic vis-a-vis the encoding-decoding scheme, computation model, programming language, formal theory, the computable (or semi-computable) method of approximation to algorithmic complexity, and any arbitrarily chosen (computable) probability measure of the events. The method derives from the principles of an approach to Artificial General Intelligence capable of building a general-purpose model of models independent of any arbitrarily assumed prior probability distribution. We argue that this optimal and universal method of decoding non-random data has applications to signal processing, causal deconvolution, topological and geometric properties encoding, cryptography, and bio- and technosignature detection.


WIP: A Unit Testing Framework for Self-Guided Personalized Online Robotics Learning

arXiv.org Artificial Intelligence

Our ongoing development and deployment of an online robotics education platform highlighted a gap in providing an interactive, feedback-rich learning environment essential for mastering programming concepts in robotics, which students were not getting with the traditional code-simulate-turn-in workflow. Since teaching resources are limited, students would benefit from feedback in real-time to find and fix their mistakes in the programming assignments. To address these concerns, this paper will focus on creating a system for unit testing while integrating it into the course workflow. We facilitate this real-time feedback by including unit testing in the design of programming assignments so students can understand and fix their errors on their own, without first needing help from instructors/TAs, who can become a bottleneck. In line with the framework's personalized student-centered approach, this method makes it easier for students to revise and debug their programming work, encouraging hands-on learning. The course workflow updated to include unit tests will strengthen the learning environment and make it more interactive so that students can learn how to program robots in a self-guided fashion.


A geometric framework for interstellar discourse on fundamental physical structures

arXiv.org Artificial Intelligence

This paper considers the possibility that abstract thinking and advanced synthesis skills might encourage extraterrestrial civilizations to accept communication with mankind on Earth. For this purpose, a notation not relying upon the use of alphabet and numbers is proposed, in order to denote just some basic geometric structures of current physical theories: vector fields, one-form fields, and tensor fields of arbitrary order. An advanced civilization might appreciate the way here proposed to achieve a concise description of electromagnetism and general relativity, and hence it might accept the challenge of responding to our signals. The abstract symbols introduced in this paper to describe the basic structures of physical theories are encoded into black and white bitmap images that can be easily converted into short bit sequences and modulated on a carrier wave for radio transmission.
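
As a rough illustration of the encoding step described above, the following minimal sketch flattens a black-and-white bitmap into a short bit sequence and maps the bits to carrier phase symbols. The 3x3 glyph, the row-by-row scan order, and the choice of binary phase-shift keying are all assumptions made for illustration; the abstract does not specify the symbols, the scan order, or the modulation scheme.

```python
# Hypothetical 3x3 glyph (1 = black pixel, 0 = white pixel); not taken from the paper.
GLYPH = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]


def bitmap_to_bits(bitmap):
    """Flatten a bitmap (rows of 0/1 pixels) into a bit string, row by row."""
    return "".join(str(pixel) for row in bitmap for pixel in row)


def bits_to_bitmap(bits, width):
    """Rebuild the bitmap from a bit string, given the image width in pixels."""
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the width")
    return [[int(b) for b in bits[i:i + width]]
            for i in range(0, len(bits), width)]


def bpsk_symbols(bits):
    """Map each bit to a carrier phase symbol: +1 for '1', -1 for '0'.

    Binary phase-shift keying is an assumed modulation, chosen for simplicity.
    """
    return [1 if b == "1" else -1 for b in bits]


bits = bitmap_to_bits(GLYPH)
symbols = bpsk_symbols(bits)
```

A receiver that knows (or guesses) the image width can recover the picture with `bits_to_bitmap(bits, 3)`.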


Computing Transiting Exoplanet Parameters with 1D Convolutional Neural Networks

arXiv.org Artificial Intelligence

The transit method allows the detection and characterization of planetary systems by analyzing stellar light curves. Convolutional neural networks appear to offer a viable solution for automating these analyses. In this research, two 1D convolutional neural network models, which work with simulated light curves in which transit-like signals were injected, are presented. One model operates on complete light curves and estimates the orbital period, and the other one operates on phase-folded light curves and estimates the semimajor axis of the orbit and the square of the planet-to-star radius ratio. Both models were tested on real data from TESS light curves with confirmed planets to ensure that they are able to work with real data. The results obtained show that 1D CNNs are able to characterize transiting exoplanets from their host star's detrended light curve and, furthermore, reduce both the required time and computational cost compared with current detection and characterization algorithms.
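
The phase-folding step that the second model relies on can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the box-shaped transit, the sampling cadence, and all parameter values are assumptions. For a box-shaped transit with no limb darkening, the fractional depth equals the square of the planet-to-star radius ratio, which is one of the quantities the abstract says the network estimates.

```python
def phase_fold(times, fluxes, period, t0=0.0):
    """Fold a light curve on a trial period and sort the samples by orbital phase.

    Phase is the fraction of an orbit elapsed since the reference epoch t0,
    so every transit lands at phase ~0 (equivalently ~1).
    """
    folded = sorted((((t - t0) / period) % 1.0, f)
                    for t, f in zip(times, fluxes))
    phases = [p for p, _ in folded]
    folded_flux = [f for _, f in folded]
    return phases, folded_flux


def box_transit_flux(times, period, t0, duration, depth):
    """Toy light curve: flux 1.0 out of transit, 1.0 - depth in transit."""
    fluxes = []
    for t in times:
        phase = ((t - t0) / period) % 1.0
        # Phase distance from mid-transit, wrapped into [0, 0.5].
        dist = min(phase, 1.0 - phase)
        in_transit = dist * period <= duration / 2
        fluxes.append(1.0 - depth if in_transit else 1.0)
    return fluxes


# Assumed toy values: 30 days sampled every 0.1 d, 3-day period, 1% depth.
times = [i * 0.1 for i in range(300)]
fluxes = box_transit_flux(times, period=3.0, t0=1.0, duration=0.4, depth=0.01)
phases, folded_flux = phase_fold(times, fluxes, period=3.0, t0=1.0)

# In this idealized case the depth is the squared radius ratio (Rp/Rs)^2.
estimated_depth = 1.0 - min(folded_flux)
```

Folding stacks all ten simulated transits on top of each other near phase 0, which is why a phase-folded curve is a convenient fixed-length input for a network.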


IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models

arXiv.org Artificial Intelligence

In-context learning is a promising paradigm that utilizes in-context examples as prompts for the predictions of large language models. These prompts are crucial for achieving strong performance. However, since the prompts need to be sampled from a large volume of annotated examples, finding the right prompt may result in high annotation costs. To address this challenge, this paper introduces an influence-driven selective annotation method that aims to minimize annotation costs while improving the quality of in-context examples. The essence of our method is to select a pivotal subset from a large-scale unlabeled data pool to annotate for the subsequent sampling of prompts. Specifically, a directed graph is first constructed to represent unlabeled data. Afterward, the influence of candidate unlabeled subsets is quantified with a diffusion process. A simple yet effective greedy algorithm for unlabeled data selection is lastly introduced. It iteratively selects the data if it provides a maximum marginal gain with respect to quantified influence. Compared with previous efforts on selective annotations, our influence-driven method works in an end-to-end manner, avoids an intractable explicit balance between data diversity and representativeness, and enjoys theoretical support. Experiments confirm the superiority of the proposed method on various benchmarks, achieving better performance under lower time consumption during subset selection. The project page is available at https://skzhang1.github.io/IDEAL/. In-context learning (ICL) entails presenting a small set of examples with demonstrations as prompts (called in-context examples) to large language models (LLMs), before making predictions on test inputs (Wei et al., 2022a; Min et al., 2022; Akyürek et al., 2023).
This emerging few-shot learning paradigm is an appealing alternative to supervised fine-tuning as it can avoid heavy parameter updates of language models while improving accuracy (Liu et al., 2021; Yoo et al., 2022). Recent studies indicate that obtaining prompts from a vast collection of annotated examples is crucial to achieving strong performance (Rubin et al., 2022). Notably, these studies have illuminated the substantial performance improvements when retrieving analogous examples (under specific embedding criteria) as in-context examples tailored for each individual test input.
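
The selection procedure outlined in the abstract (a directed graph over unlabeled data, influence measured by diffusion, greedy choice by marginal gain) can be sketched roughly as below. This is a simplified stand-in, not the paper's algorithm: influence is taken here to be the number of nodes reachable from the selected set, a deterministic diffusion that is an assumption of this sketch, and the toy graph and node ids are hypothetical.

```python
from collections import deque


def influence(graph, seeds):
    """Return the set of nodes reached by spreading from `seeds` along edges.

    Deterministic reachability is used as a simple proxy for a diffusion
    process; the paper's actual diffusion model may differ.
    """
    reached = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append(neighbour)
    return reached


def greedy_select(graph, budget):
    """Pick up to `budget` nodes, each time taking the largest marginal gain."""
    selected = []
    covered = set()
    candidates = set(graph)
    for _ in range(min(budget, len(candidates))):
        best, best_gain = None, -1
        for node in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = len(influence(graph, selected + [node])) - len(covered)
            if gain > best_gain:
                best, best_gain = node, gain
        selected.append(best)
        candidates.remove(best)
        covered = influence(graph, selected)
    return selected


# Hypothetical directed graph: every node appears as a key, values are out-edges.
GRAPH = {0: [1, 2], 1: [2], 2: [], 3: [4], 4: [], 5: []}
chosen = greedy_select(GRAPH, budget=2)
```

On this toy graph the first pick is node 0, which reaches three nodes, and the second is node 3, which adds two more; only the chosen nodes would then be sent for annotation.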